Remus: High Availability via Asynchronous Virtual Machine Replication. (Best Paper)

Authors

  • Brendan Cully
  • Geoffrey Lefebvre
  • Dutch T. Meyer
  • Michael J. Feeley
  • Norman C. Hutchinson
  • Andrew Warfield
Abstract

Allowing applications to survive hardware failure is an expensive undertaking, which generally involves reengineering software to include complicated recovery logic as well as deploying special-purpose hardware; this represents a severe barrier to improving the dependability of large or legacy applications. We describe the construction of a general and transparent high availability service that allows existing, unmodified software to be protected from the failure of the physical machine on which it runs. Remus provides an extremely high degree of fault tolerance, to the point that a running system can transparently continue execution on an alternate physical host in the face of failure with only seconds of downtime, while completely preserving host state such as active network connections. Our approach encapsulates protected software in a virtual machine, asynchronously propagates changed state to a backup host at frequencies as high as forty times a second, and uses speculative execution to concurrently run the active VM slightly ahead of the replicated system state.
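
The checkpointing loop described in the abstract can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the Remus implementation: the `Primary` and `Backup` classes and all of their methods are hypothetical names, VM state is reduced to a dictionary, and the asynchronous transfer is collapsed into a direct method call. The sketch also assumes that externally visible output is held back until the backup acknowledges the checkpoint that produced it, a detail the abstract does not spell out.

```python
from dataclasses import dataclass, field

EPOCHS_PER_SECOND = 40  # checkpoint frequency quoted in the abstract


@dataclass
class Backup:
    """Hypothetical stand-in for the backup host: keeps the last acknowledged state."""
    state: dict = field(default_factory=dict)

    def apply(self, delta: dict) -> bool:
        self.state.update(delta)
        return True  # acknowledge the checkpoint


@dataclass
class Primary:
    """Hypothetical protected VM that runs ahead of its replicated state."""
    backup: Backup
    state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    pending_output: list = field(default_factory=list)
    released_output: list = field(default_factory=list)

    def write(self, key, value):
        # Speculative execution: state changes locally before it is replicated.
        self.state[key] = value
        self.dirty.add(key)

    def emit(self, message):
        # Assumption: output stays buffered until the next checkpoint is acknowledged.
        self.pending_output.append(message)

    def checkpoint(self):
        # Send only the state changed since the previous epoch.
        delta = {key: self.state[key] for key in self.dirty}
        outputs, self.pending_output = self.pending_output, []
        self.dirty = set()
        if self.backup.apply(delta):
            self.released_output.extend(outputs)


backup = Backup()
vm = Primary(backup)
vm.write("counter", 1)
vm.emit("reply-1")
vm.checkpoint()          # epoch boundary: delta replicated, output released
vm.write("counter", 2)   # the VM is now ahead of the backup
vm.emit("reply-2")       # still buffered, so no client has seen it
```

If the primary failed at this point, the backup would resume from `counter == 1`. Because `reply-2` was never released, no outside observer could have seen state that the backup lacks, which is what makes running ahead of the replica safe in this model.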


Similar Resources

Enhancing the Performance of High Availability Lightweight Live Migration

Remus was the first system which implemented whole virtual machine replication to achieve high availability (HA). Recently a fast, lightweight migration mechanism (LLM) was proposed to reduce the long network delay in Remus. However, all these virtualized systems have the long downtime problem, which becomes a bottleneck to achieve HA. Based on LLM, in this paper we describe a fine-grained bloc...


Lightweight Live Migration for High Availability

High availability is a critical feature for service clusters and cloud computing, and is often considered more valuable than performance. One commonly used technique to enhance the availability is live migration, which replicates services based on virtualization technology. However, continuous live migration with checkpointing will introduce significant overhead. In this paper, we present a lig...


Resilire: Achieving High Availability at the Virtual Machine Level

High availability is a critical feature of data centers, cloud, and cluster computing environments. Replication is a classical approach to increase service availability by providing redundancy. However, traditional replication methods are increasingly unattractive for deployment due to several limitations such as application-level non-transparency, non-isolation of applications (cau...


Towards Superclouds

ion Level Feature Existing Clouds Application monitoring Auto-scaling Non-live migration Mutable Hypervisor Page sharing [81, 145] Overdriver [154] Revirt [65] Remus [61] Live migration [57] Cross-provider live migration Exposed Hardware vSnoop [92] Superpages [121] Page coloring [93] Non fate-sharing Unsupported paravirtualization Table 3.1: Cloud abstractions and extensions they enable maxima...


Continuous Performance Analysis of Fault-Tolerant Virtual Machines

Virtual machine technology has been successfully applied for the construction of fault-tolerant computing systems. For example, VMware Fault Tolerance and Xen Remus support transparent failover of VMs running on different physical machines in a local area network. However, high availability alone is not sufficient in many application domains. Especially in the context of Cyber-Physical Systems,...




Publication date: 2008